AITopics | traditional metric

traditional metric

GeoPTH: A Lightweight Approach to Category-Based Trajectory Retrieval via Geometric Prototype Trajectory Hashing

Xu, Yang, Yang, Zuliang, Ting, Kai Ming

arXiv.org Artificial Intelligence, Nov-24-2025

Trajectory similarity retrieval is an important part of spatiotemporal data mining. However, existing methods have the following limitations: traditional metrics are computationally expensive, while learning-based methods suffer from substantial training costs and potential instability. This paper addresses these problems by proposing Geometric Prototype Trajectory Hashing (GeoPTH), a novel, lightweight, and non-learning framework for efficient category-based trajectory retrieval. GeoPTH constructs data-dependent hash functions by using representative trajectory prototypes, i.e., small point sets preserving geometric characteristics, as anchors. The hashing process is efficient: it maps a new trajectory to its closest prototype via a robust Hausdorff metric. Extensive experiments show that GeoPTH's retrieval accuracy is highly competitive with both traditional metrics and state-of-the-art learning methods, and that it significantly outperforms binary codes generated through simple binarization of the learned embeddings. Critically, GeoPTH consistently outperforms all competitors in terms of efficiency. Our work demonstrates that a lightweight, prototype-centric approach offers a practical and powerful alternative, achieving both exceptional retrieval performance and computational efficiency.
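To make the prototype-hashing idea in this abstract concrete, here is a minimal sketch of assigning a trajectory to its nearest prototype. It assumes the classic symmetric Hausdorff distance over 2-D points; the abstract only says "robust Hausdorff metric", so the paper's exact variant, its prototype construction, and its use of multiple hash functions are not reproduced here. The function names `hausdorff` and `hash_trajectory` are hypothetical.

```python
import numpy as np

def directed_hausdorff(a, b):
    """Largest distance from any point of a to its nearest point of b."""
    # Pairwise Euclidean distances, shape (len(a), len(b)).
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=2)
    return d.min(axis=1).max()

def hausdorff(a, b):
    """Classic symmetric Hausdorff distance (an assumption; the paper's
    'robust' variant may differ)."""
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))

def hash_trajectory(traj, prototypes):
    """Return the index of the closest prototype, used here as a hash value."""
    dists = [hausdorff(traj, p) for p in prototypes]
    return int(np.argmin(dists))

# Two hypothetical prototypes: one along y = 0, one along y = 5.
prototypes = [
    np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]]),
    np.array([[0.0, 5.0], [1.0, 5.0], [2.0, 5.0]]),
]
traj = np.array([[0.0, 0.2], [1.5, 0.1]])
print(hash_trajectory(traj, prototypes))  # index of the y = 0 prototype
```

Because each prototype is a small point set, one hash evaluation costs only a handful of point-to-point distance computations, which is consistent with the efficiency argument in the abstract.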

data mining, machine learning, trajectory, (19 more...)

2511.16258

Country: Asia > China (0.28)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)

Evaluating the Evaluators: Are readability metrics good measures of readability?

Cachola, Isabel, Khashabi, Daniel, Dredze, Mark

arXiv.org Artificial Intelligence, Aug-27-2025

Plain Language Summarization (PLS) aims to distill complex documents into accessible summaries for non-expert audiences. In this paper, we conduct a thorough survey of PLS literature, and identify that the current standard practice for readability evaluation is to use traditional readability metrics, such as Flesch-Kincaid Grade Level (FKGL). However, despite proven utility in other fields, these metrics have not been compared to human readability judgments in PLS. We evaluate 8 readability metrics and show that most correlate poorly with human judgments, including the most popular metric, FKGL. We then show that Language Models (LMs) are better judges of readability, with the best-performing model achieving a Pearson correlation of 0.56 with human judgments. Extending our analysis to PLS datasets, which contain summaries aimed at non-expert audiences, we find that LMs better capture deeper measures of readability, such as required background knowledge, and lead to different conclusions than the traditional metrics. Based on these findings, we offer recommendations for best practices in the evaluation of plain language summaries. We release our analysis code and survey data.
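For readers unfamiliar with the two quantities named in this abstract, the sketch below computes the standard Flesch-Kincaid Grade Level from pre-counted words, sentences, and syllables, and a Pearson correlation between metric scores and human ratings. The counts and ratings are made-up illustrative inputs, not data from the paper, and the helper names are hypothetical.

```python
import numpy as np

def fkgl(total_words, total_sentences, total_syllables):
    """Flesch-Kincaid Grade Level from raw counts (standard formula)."""
    return (0.39 * (total_words / total_sentences)
            + 11.8 * (total_syllables / total_words)
            - 15.59)

def pearson(x, y):
    """Pearson correlation coefficient between two score lists."""
    return float(np.corrcoef(x, y)[0, 1])

# Illustrative only: 100 words, 5 sentences, 150 syllables.
grade = fkgl(100, 5, 150)

# Illustrative only: hypothetical metric scores vs. human readability ratings.
metric_scores = [9.9, 12.4, 7.1, 14.0]
human_ratings = [3.0, 2.5, 4.0, 2.0]
r = pearson(metric_scores, human_ratings)
```

Note that FKGL depends only on sentence length and syllable density, which is one plausible reason it cannot capture properties such as required background knowledge.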

large language model, machine learning, natural language, (18 more...)

2508.19221

Country:

Asia > Middle East > UAE (0.46)
North America > United States > Minnesota (0.28)

Genre:

Overview (1.00)
Research Report > New Finding (0.93)
Research Report > Experimental Study (0.68)

Industry:

Health & Medicine (1.00)
Government > Regional Government > North America Government > United States Government (0.46)
Energy > Renewable (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.96)

MapEval: Towards Unified, Robust and Efficient SLAM Map Evaluation Framework

Hu, Xiangcheng, Wu, Jin, Jia, Mingkai, Yan, Hongyu, Jiang, Yi, Jiang, Binqian, Zhang, Wei, He, Wei, Tan, Ping

arXiv.org Artificial Intelligence, Nov-26-2024

Evaluating massive-scale point cloud maps in Simultaneous Localization and Mapping (SLAM) remains challenging, primarily due to the absence of unified, robust and efficient evaluation frameworks. We present MapEval, an open-source framework for comprehensive quality assessment of point cloud maps, specifically addressing SLAM scenarios where the ground truth map is inherently sparse compared to the mapped environment. Through systematic analysis of existing evaluation metrics in SLAM applications, we identify their fundamental limitations and establish clear guidelines for consistent map quality assessment. Building upon these insights, we propose a novel Gaussian-approximated Wasserstein distance in voxelized space, enabling two complementary metrics under the same error standard: Voxelized Average Wasserstein Distance (AWD) for global geometric accuracy and Spatial Consistency Score (SCS) for local consistency evaluation. This theoretical foundation leads to significant improvements in both robustness against noise and computational efficiency compared to conventional metrics. Extensive experiments on both simulated and real-world datasets demonstrate that MapEval is at least 100 to 500 times faster while maintaining evaluation integrity. The MapEval library (https://github.com/JokerJohn/Cloud_Map_Evaluation) will be publicly available to promote standardized map evaluation practices in the robotics community.
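As background for the "Gaussian-approximated Wasserstein distance" mentioned above, the well-known closed form for the 2-Wasserstein distance between two Gaussians is shown below. Whether MapEval uses exactly this expression per voxel, and how it aggregates voxels into AWD and SCS, is not stated in the abstract; treating each voxel's points as a Gaussian with mean $\mu_i$ and covariance $\Sigma_i$ is an assumption made here for illustration.

```latex
W_2^2\bigl(\mathcal{N}(\mu_1,\Sigma_1),\,\mathcal{N}(\mu_2,\Sigma_2)\bigr)
  = \lVert \mu_1 - \mu_2 \rVert_2^2
  + \operatorname{Tr}\!\Bigl(\Sigma_1 + \Sigma_2
  - 2\bigl(\Sigma_2^{1/2}\,\Sigma_1\,\Sigma_2^{1/2}\bigr)^{1/2}\Bigr)
```

The first term penalizes displacement between the two local point distributions and the second penalizes differences in their shape, so the distance is zero only when both means and covariances agree.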

accuracy, assessment, evaluation, (14 more...)

2411.17928

Country:

Asia > China > Hong Kong (0.04)
Asia > China > Beijing > Beijing (0.04)
North America > United States > Washington > King County > Seattle (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Robots (0.67)

Towards a Holistic Framework for Multimodal Large Language Models in Three-dimensional Brain CT Report Generation

Li, Cheng-Yi, Chang, Kao-Jung, Yang, Cheng-Fu, Wu, Hsin-Yu, Chen, Wenting, Bansal, Hritik, Chen, Ling, Yang, Yi-Ping, Chen, Yu-Chun, Chen, Shih-Pin, Lirng, Jiing-Feng, Chang, Kai-Wei, Chiou, Shih-Hwa

arXiv.org Artificial Intelligence, Jul-2-2024

Multi-modal large language models (MLLMs) have been given free rein to explore exciting medical applications with a primary focus on radiology report generation. Nevertheless, the preliminary success in 2D radiology captioning does not reflect the real-world diagnostic challenge of volumetric 3D anatomy. To mitigate three crucial limitations in the existing literature, namely (1) data complexity, (2) model capacity, and (3) evaluation metric fidelity, we collected a 3D-BrainCT dataset of 18,885 text-scan pairs and applied clinical visual instruction tuning (CVIT) to train BrainGPT models to generate radiology-adherent 3D brain CT reports. Statistically, our BrainGPT scored BLEU-1 = 44.35, BLEU-4 = 20.38, METEOR = 30.13, ROUGE-L = 47.6, and CIDEr-R = 211.77 during internal testing and demonstrated an accuracy of 0.91 in captioning midline shifts on the external validation CQ500 dataset. By further inspecting the captioned reports, we found that the traditional metrics appeared to measure only surface text similarity and failed to gauge the information density relevant to the diagnostic purpose. To close this gap, we proposed a novel Feature-Oriented Radiology Task Evaluation (FORTE) to estimate the report's clinical relevance (lesion feature and landmarks). Notably, the BrainGPT model scored an average FORTE F1-score of 0.71 (degree=0.661; landmark=0.706; feature=0.693; impression=0.779). To demonstrate that BrainGPT models possess objective readiness to generate human-like radiology reports, we conducted a Turing test that enrolled 11 physician evaluators, and around 74% of the BrainGPT-generated captions were indistinguishable from those written by humans. Our work embodies a holistic framework that showcases the first-hand experience of curating a 3D brain CT dataset, fine-tuning anatomy-sensible language models, and proposing robust radiology evaluation metrics.
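The reported average FORTE F1-score can be checked against the four category scores given in the abstract. The short sketch below assumes the average is an unweighted mean of the four categories; the abstract does not state the weighting, so this is only a consistency check, not the paper's evaluation code.

```python
# Category F1-scores as reported in the abstract.
forte_f1 = {
    "degree": 0.661,
    "landmark": 0.706,
    "feature": 0.693,
    "impression": 0.779,
}

# Assumption: the headline figure is an unweighted mean over categories.
average_f1 = sum(forte_f1.values()) / len(forte_f1)
print(round(average_f1, 2))  # 0.71
```

Under that assumption the mean is 0.70975, which rounds to the reported 0.71.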

braingpt, evaluation, instruction, (13 more...)

2407.02235

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.14)
Asia > Taiwan > Taiwan Province > Taipei (0.05)
North America > Canada (0.04)
Asia > China > Hong Kong (0.04)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Health & Medicine > Nuclear Medicine (1.00)
Health & Medicine > Diagnostic Medicine > Imaging (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.87)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.85)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Evaluating Text Summaries Generated by Large Language Models Using OpenAI's GPT

Shakil, Hassan, Mahi, Atqiya Munawara, Nguyen, Phuoc, Ortiz, Zeydy, Mardini, Mamoun T.

arXiv.org Artificial Intelligence, May-7-2024

In the contemporary era characterized by a deluge of data, the intelligence community faces the challenge of information overload, needing to process vast amounts of information swiftly and effectively. The ability to generate succinct, clear, and actionable summaries from diverse data sources is crucial, as it often determines the success of strategic objectives in this information-rich environment. As the demand for systems capable of automating large-scale text summarization without compromising on quality or relevance intensifies, the role of such technologies becomes increasingly critical (Liu and Lapata [2019]). Text summarization, a pivotal task within Natural Language Processing (NLP), has found widespread application across various domains, including news aggregation and the distillation of extensive documents into manageable summaries. The exponential growth in data underscores the utility of text summarization in enhancing content accessibility and comprehension, thus facilitating more efficient navigation through information landscapes (Chouikhi and Alsuhaibani [2022]).

evaluation, metric, summarization, (16 more...)

2405.04053

Country:

North America > United States > Colorado > El Paso County > Colorado Springs (0.04)
North America > United States > North Carolina > Wake County > Cary (0.04)
North America > United States > Massachusetts > Middlesex County > Lowell (0.04)
North America > United States > Kansas (0.04)

Genre: Research Report > New Finding (0.69)

Industry:

Government (0.49)
Media > News (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.43)

Psychological Metrics for Dialog System Evaluation

Giorgi, Salvatore, Havaldar, Shreya, Ahmed, Farhan, Akhtar, Zuhaib, Vaidya, Shalaka, Pan, Gary, Ungar, Lyle H., Schwartz, H. Andrew, Sedoc, Joao

arXiv.org Artificial Intelligence, Sep-15-2023

We present metrics for evaluating dialog systems through a psychologically-grounded "human" lens in which conversational agents express a diversity of both states (e.g., emotion) and traits (e.g., personality), just as people do. We present five interpretable metrics from established psychology that are fundamental to human communication and relationships: emotional entropy, linguistic style and emotion matching, agreeableness, and empathy. These metrics can be applied (1) across dialogs and (2) on turns within dialogs. The psychological metrics are compared against seven state-of-the-art traditional metrics (e.g., BARTScore and BLEURT) on seven standard dialog system data sets. We also introduce a novel data set, the Three Bot Dialog Evaluation Corpus, which consists of annotated conversations from ChatGPT, GPT-3, and BlenderBot. We demonstrate that our proposed metrics offer novel information; they are uncorrelated with traditional metrics, can be used to meaningfully compare dialog systems, and lead to increased accuracy (beyond existing traditional metrics) in predicting crowd-sourced dialog judgements. The interpretability and unique signal of our psychological metrics make them a valuable tool for evaluating and improving dialog systems.

metric, psychological metric, traditional metric, (12 more...)

2305.14757

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Europe > Ireland > Leinster > County Dublin > Dublin (0.04)
Asia > Middle East > Jordan (0.04)
(4 more...)

Genre: Research Report > New Finding (0.71)

Industry: Health & Medicine > Therapeutic Area (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.55)

CX Metrics in the Age of AI

#artificialintelligence, Mar-13-2019, 16:04:31 GMT

We're all familiar with the traditional measurement of contact center operations; these metrics have been part of the industry for decades. With the growth of the Internet, websites, self-service, mobile devices, mobile apps, and social media, and with changes to consumer behavior in the "always on" world, the industry has been in a constant state of reinvention, given the value of data in reducing operational costs while improving service. Chatbots, voice-activated assistants, Natural Language Processing (NLP), and Artificial Intelligence (AI) are changing the way we measure CX once again, and at a scale few could have imagined only a few years ago. Before we get into these shifts, let's look at the most traditional metrics, which are still important today but are not as granular, not as vast, and not as near-real-time as the latest software and cloud advancements make possible. These metrics will never go away, but are they enough?

artificial intelligence, contact center, natural language, (15 more...)

Country: North America > United States > Iowa (0.15)

Technology: Information Technology > Artificial Intelligence > Natural Language (1.00)
